Statement of Contribution

There are two steps of the shared group strategy: each member should try to solve the problems for themselves, potential discussions are encouraged to inspire learning process; the report is generated and merged in principle presenting the more appropriate solutions out of members’. Generally, both members are capable of delivering their own solutions and equally contributed in the group report.

The same strategy was implemented in report revision, where discussion between members and with classmates comprised the main part of problem identification. In this report, Jun Li has major responsibility for assignment 2, while Fahed Maqbool for assignment 1.

Assignment 1: Network visualization of terrorist connections

part 1

There are two main clusters which are centered by Mohamed Chaoui and Semaan Gaby Eid. In particular, Mohamed Chaoui and Jamal Zougam have the most connections to other members, and both has participated the Bombing. At the same time, both have strong direct tie with Amer Azizi, Imad Eddin Barakat and Galeb Karaj, who do not participated Bombing, which suggests that these three people may be the potential threat in the future.

part 2

Based on path length of 2 of selected node, we found that both Mohamed Chaoui and Jamal Zougam have the largest network, and they have the best opportunity to spread the information in the network. Google shows that they conducted train bombings in Madrid 2004, which killed 202 people and wounded 1500.

part 3

There are 15 clusters and 6 unconnected nodes in the network, which covers and further divides the two clusters mentioned in step 1.

part 4

The up-right corner of heatmap shows the significant cluster where the widely connection and strong band is shared through network. The key figures mentioned in step 1 and 3 such as Mohamed Chaoui, Jamal Zougam, Amer Azizi, Imad Eddin Barakat and Galeb Karaj are all covered in the cluster.

Assignment 2: Animations of time series data

part 1

Most countries increases the consumption of coal, and particularly China, US and India increases in both coal and oil. However there is decreasing trend of coal in France and UK, which suggests that they have done a good job in protecting the environment.

part 2

US and China have the similar motion patterns, according to the motion chart below, China surpassed US in Coal consumption in 1982 and increases dramatically after Millennium due to the rapid development.

part 3

Proportion of oil consumption increases in most countries except stablizing in US and decreasing in Japan. This method is hard to compare similar values when the range is too large, but easier to interpret. However, when time varies then it is hard to grasp interaction between countries. Moreover, absolute changes are not shown when consumption increases in both oil and coal.

part 4

It gets easier to compare between countries, but still a bit annoying when it rebounds, also will readers tend to drop the development trend of a country.

part 5

The most compact and well-separated clusters are found at step 13.3, where the clusters are separated into two year ranges, before and after year 1991. China, Germany and United Kingdom are the three variables which contributes the most, which can be explained by the significant rising consumption in China, as shown in the time-series plot.

## [1] "The optimal projection matrix is: "
##         [,1]    [,2]
## [1,]  0.8418  0.0271
## [2,]  0.0271  0.8762
## [3,] -0.0509  0.0270
## [4,]  0.1659 -0.3967
## [5,]  0.2374 -0.0891
## [6,] -0.4445 -0.0937
## [7,]  0.0605  0.1586
## [8,]  0.0535 -0.1775

Appendix code

knitr::opts_chunk$set(eval=TRUE,echo=F,
                      message = F, warning = F, error = F,
                      fig.width=7, fig.height=5, fig.align="center")
RNGversion('3.5.1')
library(ggplot2)
library(plotly)

library(ggraph)
library(igraph)
library(visNetwork)

library(seriation)

library(tourr)
## part 1.1
nodes<-read.delim("trainMeta.dat",header=F,sep=" ")
links<-read.delim("trainData.dat",header=F,sep=" ")

colnames(nodes)<-c("label","group")
nodes$id<-c(1:nrow(nodes))
links<-links[,-1]
colnames(links)<-c("from","to","width")
links$width<-links$width*5

g<-graph_from_data_frame(d=links, vertices=NULL, directed=F) ## prepare graph for strength()
lines<-strength(g)
nodes$value<-0
nodes$value[as.numeric(names(lines))]<-lines*5

set.seed(1234)
net<-visNetwork(nodes,links) %>% 
  visPhysics(solver="repulsion") %>% #optimizes repulsion forces
  visLegend() 

net %>% visOptions(highlightNearest = list(enabled = TRUE, degree = 0)) 
                   #highlighted path with path length 1

## part 1.2
net %>% visOptions(highlightNearest = list(enabled = TRUE, degree = 1)) 
## part 1.3
nodes1<-nodes
g <- graph_from_data_frame(d=links,directed=F)
ceb <-cluster_edge_betweenness(g) 

indi<-as.numeric(ceb$names)  ## those who have connections
commu<-ceb$membership  ## corresponding communities 

nodes1$group[indi]<-commu ## assign communities to group
commu_num<-length(levels(factor(commu)))  
nodes1$group[-indi]<-c((commu_num+1):(commu_num+nrow(nodes)-length(indi)))
                         ## those not included in any communities have own group
visNetwork(nodes1,links) %>% visIgraphLayout() #%>% visLegend()
## part 1.4
gm <- get.adjacency(g, attr="width", sparse=F)
colnames(gm) <- nodes$label[as.numeric(names(V(g)))]
rownames(gm) <- nodes$label[as.numeric(names(V(g)))]

rowdist<-dist(gm)

order1<-seriate(rowdist, "HC")
ord1<-get_order(order1)

reordmatr<-gm[ord1,ord1]

library(plotly)

plot_ly(z=~reordmatr, x=~colnames(reordmatr), 
        y=~rownames(reordmatr), type="heatmap")
## part 2.1
da<-read.csv2("Oilcoal.csv")
da<-da[,-6]

da %>%
  plot_ly(x = ~Coal, y = ~Oil,size = ~Marker.size, 
          color = ~Country, frame = ~Year, text = ~Country, 
          hoverinfo = "text",type = 'scatter',mode = 'markers') %>% 
  #layout(xaxis=list(type="log")) %>% 
  animation_opts(1000, easing ="elastic",redraw = FALSE) %>% 
  #animation_button(x = 1, xanchor = "right", y = 0, yanchor = "bottom") %>% 
  animation_slider(
    currentvalue = list(prefix = "YEAR ", font = list(color="red")) )

## part 2.2
filter(da,Country=="US" | Country=="China") %>%
  plot_ly(x = ~Coal, y = ~Oil,size = ~Marker.size, 
          color = ~Country, frame = ~Year, text = ~Country, 
          hoverinfo = "text",type = 'scatter',mode = 'markers') %>% 
  animation_opts(1000, easing ="elastic",redraw = FALSE) %>% 
  animation_slider(currentvalue = list(prefix = "YEAR ", font = list(color="red")) )

## part 2.3
da3<-da
da3$prop<-da3$Oil/(da3$Oil+da3$Coal)

nyda1<-da3[,-c(3:5)]
nyda2<-nyda1
nyda2$prop<-0
nyda<-rbind(nyda1,nyda2)
nyda<-nyda[order(nyda$Country,nyda$Year,nyda$prop),]

g<-nyda %>% 
    plot_ly(x=~Country,y=~prop,frame=~Year,color=~Country,size=50,
            text=~prop,hoverinfo="text") %>%
    add_lines()
g
#nyda %>%
  #plot_ly(x = ~prop, y =~reorder(Country,prop),color = ~Country, 
  #        frame = ~Year, type = 'bar',orientation='h') %>% 
  #animation_opts(1000, easing ="elastic",redraw = FALSE) %>% 
  #animation_slider(
    #currentvalue = list(prefix = "YEAR ", font = list(color="red")) )
## part 2.4
g %>% animation_opts(1000, easing ="elastic",redraw = FALSE)
## part 2.5
set.seed(12345)

#A modified code from course website
temp<-da[,c(1:3)]
countries<-levels(factor(temp$Country))
years<-levels(factor(temp$Year))
da5<-as.data.frame(matrix(0,nrow=length(years),ncol=length(countries)))
rownames(da5)<-years
colnames(da5)<-countries
for(i in countries) 
  for(j in years)
    da5[j,i]<-temp[which(temp$Country==i & temp$Year==j),3]

da5<-rescale(da5)
steps <- c(0, rep(1/15, 200))
stepz <- cumsum(steps)

tour <- new_tour(da5, grand_tour(), NULL)

Projs<-lapply(steps, function(step_size){  
  step <- tour(step_size)
  if(is.null(step)) {
    .GlobalEnv$tour<- new_tour(da5, guided_tour(cmass), NULL)
    step <- tour(step_size)
  }
  step}    )

# projection of each observation
tour_dat <- function(i) {
  step <- Projs[[i]]
  proj <- center(da5 %*% step$proj)
  data.frame(x = proj[,1], y = proj[,2], state = rownames(da5))
}

# projection of each variable's axis
proj_dat <- function(i) {
  step <- Projs[[i]]
  data.frame(
    x = step$proj[,1], y = step$proj[,2], variable = colnames(da5)
  )  }



# tidy version of tour data

tour_dats <- lapply(1:length(steps), tour_dat)
tour_datz <- Map(function(x, y) cbind(x, step = y), tour_dats, stepz)
tour_dat <- dplyr::bind_rows(tour_datz)

# tidy version of tour projection data
proj_dats <- lapply(1:length(steps), proj_dat)
proj_datz <- Map(function(x, y) cbind(x, step = y), proj_dats, stepz)
proj_dat <- dplyr::bind_rows(proj_datz)

ax <- list(
  title = "", showticklabels = FALSE,
  zeroline = FALSE, showgrid = FALSE,
  range = c(-1.1, 1.1)
)

# for nicely formatted slider labels
options(digits = 3)
tour_dat <- highlight_key(tour_dat, ~state, group = "A")
tour <- proj_dat %>%
  plot_ly(x = ~x, y = ~y, frame = ~step, color = I("black")) %>%
  add_segments(xend = 0, yend = 0, color = I("gray80")) %>%
  add_text(text = ~variable) %>%
  add_markers(data = tour_dat, text = ~state, ids = ~state, hoverinfo = "text") %>%
  layout(xaxis = ax, yaxis = ax)#%>%animation_opts(frame=0, transition=0, redraw = F)
tour

print("The optimal projection matrix is: ")
Projs[[13.3]]$proj

da %>% filter(da$Country==c("China","Germany","United Kingdom")) %>% plot_ly(x=~Year,y=~Coal,color=~Country) %>% add_lines()